18 research outputs found
SPOTS: Stable Placement of Objects with Reasoning in Semi-Autonomous Teleoperation Systems
Pick-and-place is one of the fundamental tasks in robotics research. However,
attention has mostly focused on the ``pick'' task, leaving the
``place'' task relatively unexplored. In this paper, we address the problem of
placing objects in the context of a teleoperation framework. Particularly, we
focus on two aspects of the place task: stability robustness and contextual
reasonableness of object placements. Our proposed method combines
simulation-driven physical stability verification via real-to-sim and the
semantic reasoning capability of large language models. In other words, given
place context information (e.g., user preferences, object to place, and current
scene information), our proposed method outputs a probability distribution over
the possible placement candidates, considering the robustness and
reasonableness of the place task. Our proposed method is extensively evaluated
in two simulation environments and one real-world environment, and we show that
it greatly increases both the physical plausibility and the contextual
soundness of placements while considering user preferences.
Comment: 7 pages
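The abstract describes outputting a probability distribution over placement candidates that jointly accounts for physical stability and semantic reasonableness. A minimal sketch of that idea, assuming a simple weighted combination followed by a softmax (the function name, scores, and weighting are illustrative, not from the paper):

```python
import math

def placement_distribution(candidates, stability, semantic, alpha=0.5):
    """Combine a physical-stability score and an LLM-derived semantic score
    into a probability distribution over placement candidates.
    `stability` and `semantic` map candidate -> score in [0, 1];
    `alpha` trades stability against semantic reasonableness.
    All names here are hypothetical, used only to illustrate the idea."""
    logits = [alpha * stability[c] + (1 - alpha) * semantic[c] for c in candidates]
    # Softmax turns the combined scores into a normalized distribution.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(candidates, exps)}

probs = placement_distribution(
    ["shelf", "table", "sink"],
    stability={"shelf": 0.9, "table": 0.8, "sink": 0.3},
    semantic={"shelf": 0.7, "table": 0.9, "sink": 0.2},
)
```

A candidate that is both stable and contextually sensible (here, the table) receives the highest probability, while an unstable, unreasonable one (the sink) is suppressed rather than hard-rejected.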
INSTA-BEEER: Explicit Error Estimation and Refinement for Fast and Accurate Unseen Object Instance Segmentation
Efficient and accurate segmentation of unseen objects is crucial for robotic
manipulation. However, it remains challenging due to over- or
under-segmentation. Although existing refinement methods can enhance
segmentation quality, they either fix only minor boundary errors or are not
sufficiently fast. In this work, we propose INSTAnce Boundary Explicit Error
Estimation and Refinement (INSTA-BEEER), a novel refinement model that allows
for adding and deleting instances and sharpening boundaries. Leveraging an
error-estimation-then-refinement scheme, the model first estimates the
pixel-wise boundary explicit errors: true positive, true negative, false
positive, and false negative pixels of the instance boundary in the initial
segmentation. It then refines the initial segmentation using these error
estimates as guidance. Experiments show that the proposed model significantly
enhances segmentation, achieving state-of-the-art performance. Furthermore,
with a fast runtime (less than 0.1 s), the model consistently improves
performance across various initial segmentation methods, making it highly
suitable for practical robotic applications.
Comment: 8 pages, 5 figures
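The abstract names four pixel-wise boundary error categories (true positive, true negative, false positive, false negative) that the model estimates before refining. As a sketch, these maps can be computed from a predicted and a ground-truth boundary mask with elementwise boolean logic (the function and arrays below are illustrative, not the paper's code):

```python
import numpy as np

def boundary_errors(pred_boundary, gt_boundary):
    """Classify each pixel of a predicted instance boundary against the
    ground truth as TP / TN / FP / FN -- the four explicit error maps an
    error-estimation-then-refinement model would predict and then use as
    guidance. Inputs are boolean arrays of the same shape."""
    tp = pred_boundary & gt_boundary    # boundary correctly predicted
    tn = ~pred_boundary & ~gt_boundary  # background correctly predicted
    fp = pred_boundary & ~gt_boundary   # spurious boundary (over-segmentation)
    fn = ~pred_boundary & gt_boundary   # missed boundary (under-segmentation)
    return tp, tn, fp, fn
```

During training these maps come from ground truth; at inference the model must predict them, and the refinement stage uses the predicted FP/FN regions to delete, add, or sharpen instance boundaries.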
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents
In this paper, we focus on inferring whether the given user command is clear,
ambiguous, or infeasible in the context of interactive robotic agents utilizing
large language models (LLMs). To tackle this problem, we first present an
uncertainty estimation method for LLMs to classify whether the command is
certain (i.e., clear) or not (i.e., ambiguous or infeasible). Once the command
is classified as uncertain, we further classify it as either ambiguous or
infeasible by leveraging LLMs with situation-aware context in a
zero-shot manner. For ambiguous commands, we disambiguate the command by
interacting with users via question generation with LLMs. We believe that
proper recognition of the given commands could lead to a decrease in
malfunction and undesired actions of the robot, enhancing the reliability of
interactive robot agents. We present a dataset for robotic situational
awareness, consisting of pairs of high-level commands, scene descriptions, and
command-type labels (i.e., clear, ambiguous, or infeasible). We validate the
proposed method on the collected dataset and in a pick-and-place tabletop
simulation. Finally, we demonstrate the proposed approach in real-world
human-robot interaction experiments, i.e., handover scenarios.
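The two-stage decision described in the abstract, first an uncertainty estimate separating clear from uncertain commands, then a zero-shot LLM call that labels uncertain ones as ambiguous or infeasible, can be sketched as a small control flow. `uncertainty_fn` and `llm_classify` are hypothetical stand-ins for the paper's components:

```python
def classify_command(command, scene, uncertainty_fn, llm_classify, threshold=0.5):
    """Two-stage command classification sketched from the abstract.
    `uncertainty_fn(command, scene)` returns an uncertainty score in [0, 1];
    `llm_classify(command, scene)` is a zero-shot LLM call returning
    "ambiguous" or "infeasible". Both are assumptions for illustration."""
    if uncertainty_fn(command, scene) < threshold:
        return "clear"
    label = llm_classify(command, scene)
    if label == "ambiguous":
        # An ambiguous command would trigger LLM question generation to
        # disambiguate with the user; that interaction loop is omitted here.
        pass
    return label
```

The point of the split is that only genuinely uncertain commands pay the cost of the extra LLM reasoning step and the user interaction loop.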
Nondestructive examination of chemical vapor infiltration of 0°/90° SiC/Nicalon composites
Ph.D. thesis, S. R. Stoc
Similarity Graph-Based Camera Tracking for Effective 3D Geometry Reconstruction with Mobile RGB-D Camera
In this paper, we present a novel approach for reconstructing 3D geometry from a stream of images captured by a consumer-grade mobile RGB-D sensor. In contrast to previous real-time online approaches that process each incoming image in acquisition order, we show that applying a carefully selected order of (possibly a subset of) frames for pose estimation enables robust 3D reconstruction while automatically filtering out error-prone images. Our algorithm first organizes the input frames into a weighted graph called the similarity graph. A maximum spanning tree is then found in the graph, and its traversal determines the frames and their processing order. The basic algorithm is then extended by locally repairing the original spanning tree and merging disconnected tree components where possible, which enhances the 3D reconstruction result. The capability of our method to generate a less error-prone stream from an input RGB-D stream may also be effectively combined with more sophisticated state-of-the-art techniques, further increasing their effectiveness in 3D reconstruction.
Semi-Autonomous Teleoperation via Learning Non-Prehensile Manipulation Skills
In this paper, we present a semi-autonomous teleoperation framework for a
pick-and-place task using an RGB-D sensor. In particular, we assume that the
target object is located in a cluttered environment where both prehensile
grasping and non-prehensile manipulation are combined for efficient
teleoperation. Trajectory-based reinforcement learning is utilized to learn
non-prehensile manipulation skills that rearrange objects to enable direct
grasping. From the depth image of the cluttered environment and the location
of the goal object, the learned policy provides the human operator with
multiple non-prehensile manipulation options. We carefully design a reward
function for the rearranging task, and the policy is trained in a simulated
environment. The trained policy is then transferred to the real world and
evaluated in a number of real-world experiments with varying numbers of
objects, where we show that the proposed method outperforms manual keyboard
control in terms of time to grasp.
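The abstract notes that a carefully designed reward function drives the rearrangement policy. As a purely hypothetical sketch of the trade-offs such a reward must encode (none of these terms or coefficients are from the paper), one could reward making the goal object directly graspable while penalizing disturbed clutter and elapsed time:

```python
def rearrange_reward(graspable_after, graspable_before, displaced_others,
                     step_cost=0.01):
    """Hypothetical reward shaping for a rearrangement task: reward the
    transition that makes the goal object directly graspable, penalize
    disturbing unrelated objects and taking extra steps. Illustrative only;
    the paper's actual reward function may differ."""
    reward = -step_cost                   # small time penalty every step
    if graspable_after and not graspable_before:
        reward += 1.0                     # goal object became graspable
    reward -= 0.1 * displaced_others      # avoid knocking other objects around
    return reward
```

Shaping of this kind is what lets the policy prefer short pushes that clear a grasp path over long interactions that scatter the rest of the clutter.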